[https://nvbugs/5955188][fix] Fix harmony parsers for agentic coding use cases#12045

Open
dongfengy wants to merge 4 commits into NVIDIA:main from dongfengy:user/dongfengy/aa_fix_harmony_main

Conversation


@dongfengy dongfengy commented Mar 9, 2026

Summary by CodeRabbit

  • New Features

    • Enhanced tool call handling during streaming with improved validation and filtering logic
    • Reasoning content now preserved and propagated through streamed responses
  • Improvements

    • Improved end-of-stream processing for better consistency
    • Better parsing and validation of tool call recipients

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@dongfengy dongfengy requested a review from JunyiXu-nv March 9, 2026 17:04
@dongfengy dongfengy requested a review from a team as a code owner March 9, 2026 17:05
@dongfengy dongfengy requested a review from syuoni March 9, 2026 17:05

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough

Walkthrough

This change adds comprehensive tool call handling to the harmony adapter's token streaming logic, including detection, validation, and argument accumulation across analysis and commentary channels. It enhances harmony-to-OpenAI format conversion with reasoning_content support, improves tool call recipient parsing with JSON validation, and refines end-of-stream processing.

Changes

Cohort / File(s) Summary
Tool Call Handling & Streaming
tensorrt_llm/serve/harmony_adapter.py
Introduces tool call detection and processing during token streaming with channel state management, argument accumulation across deltas, centralized filtering logic, and support for both function-style and non-function tool invocations. Enhances harmony-to-OpenAI mapping with reasoning_content threading through streaming paths. Improves tool call recipient parsing for constrained JSON payloads with validation. Refines end-of-stream handling to flush remaining tokens and maintains consistency across response construction and delta emission.
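The argument accumulation across deltas described above can be sketched as a join-then-validate loop. `accumulate_tool_args` and its delta format are illustrative assumptions, not the adapter's actual API (the real harmony_adapter.py tracks state per tool call and channel).

```python
import json

def accumulate_tool_args(argument_deltas):
    """Join streamed argument fragments and parse once they form valid JSON.

    Illustrative only: shows the core idea of accumulating partial tool-call
    arguments across streaming deltas and validating the combined payload.
    """
    buffer = "".join(argument_deltas)
    try:
        return json.loads(buffer)  # complete, valid JSON payload
    except json.JSONDecodeError:
        return None  # still partial; keep accumulating
```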

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check — ⚠️ Warning: The PR description is a template with empty sections; no actual description, test coverage, or meaningful checklist completion provided. Resolution: Fill in the Description section explaining the issue and solution, list relevant tests in Test Coverage, and complete the PR Checklist with specific details relevant to the changes.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: The title is specific and directly related to the main changes in the pull request, which focus on fixing harmony parsers to handle tool calls and agentic features for coding use cases.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 83.33%, which meets the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/serve/harmony_adapter.py`:
- Around line 1613-1615: The final-chunk path skips emitting the initial
assistant-role SSE chunk when output.token_ids_diff is empty; update the logic
around harmony_adapter.create_openai_streaming_response so it always emits the
assistant-role/event chunk prior to sending finish/usage chunks even if
token_ids_diff is empty—i.e., factor out or call the same assistant-chunk
emission used in the non-done path before checking output.token_ids_diff,
ensuring the code paths at the blocks around output.token_ids_diff (the call to
harmony_adapter.create_openai_streaming_response and the subsequent finish/usage
sending blocks) both produce the initial assistant-role SSE event.
- Around line 1615-1620: The code calls
harmony_adapter.create_openai_streaming_response(...) and then unconditionally
appends a terminal final_response when done=True, which can duplicate a stop
chunk if create_openai_streaming_response already produced one; update the logic
around remaining_responses and final_response in the done=True branches (the
occurrences that use remaining_responses and final_response) to short-circuit:
check whether remaining_responses already contains a terminal/stop chunk (the
same marker your streaming responses use) and if so do not append another
final_response, and apply the same fix to the duplicate block later (the similar
code around lines 1632-1649) to avoid emitting duplicate terminal stop chunks.
- Around line 214-219: The current filtering in should_filter_tools + the other
site (and harmony_output_to_openai) wrongly blocks built-in tools like
"browser", "python", and "container" when external tools are present; implement
a single helper (e.g., is_tool_allowed(request_id, func_name, tool_choice,
available_tools)) and use it in both filter locations so that: if tool_choice ==
"none" it returns false for everything; if tool_choice != "none" it always
returns true for built-ins ("browser", "python", "container"); and for
non-built-in functions it checks membership against available_tools (and
respects should_filter_tools). Update calls in the places referencing
should_filter_tools / available_tools and harmony_output_to_openai to call this
helper instead.
- Around line 205-223: The current branch opens streamed tool calls for
parser.current_channel == "analysis" but the transition logic in
process_token_batch() and process_token_batch_to_messages() only deactivates
previous tool calls when prev_channel == "commentary", causing reuse/merging of
tool_id across analysis-invocations; update both transition handlers so they
retire (set active=False) prior tool calls when prev_channel is either
"commentary" or "analysis" (i.e., check prev_channel in
("commentary","analysis")) before creating/getting a new tool call via
_get_or_create_tool_call, ensuring parser.current_channel/
current_channel_state/_get_or_create_tool_call don't reuse an old tool_id from
the other channel.
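The short-circuit suggested in the second comment (avoiding a duplicate terminal stop chunk) might look like the sketch below. The `'"finish_reason"'` marker is an assumption about what the serialized streaming chunks contain, not the adapter's actual convention.

```python
def append_final_if_missing(remaining_responses, final_response):
    """Append the terminal chunk only when the stream lacks one.

    Hypothetical sketch: skip the extra final_response when
    create_openai_streaming_response already emitted a stop chunk.
    Checking for '"finish_reason"' in the chunk text is an assumed
    convention, not the real harmony_adapter.py marker.
    """
    def is_terminal(chunk: str) -> bool:
        return '"finish_reason"' in chunk

    if not any(is_terminal(r) for r in remaining_responses):
        remaining_responses.append(final_response)
    return remaining_responses
```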
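A minimal sketch of the `is_tool_allowed` helper proposed in the third comment. The built-in tool names and the tool_choice semantics follow the comment's wording; the signature (with `request_id` omitted for brevity) and types are assumptions, not the adapter's real API.

```python
# Built-ins named in the review comment; the real set may differ.
BUILTIN_TOOLS = {"browser", "python", "container"}

def is_tool_allowed(func_name, tool_choice, available_tools):
    """Return True if a tool call with this recipient should be emitted."""
    if tool_choice == "none":
        return False                         # caller disabled all tool calls
    if func_name in BUILTIN_TOOLS:
        return True                          # built-ins bypass the external list
    return func_name in (available_tools or ())  # externals need membership
```

Used at both filter sites, this keeps built-in invocations flowing even when external tools are declared, which is the bug the comment describes.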
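The channel-transition fix in the last comment can be sketched as follows; the state dicts and function name are illustrative, not the adapter's real structures.

```python
# Both channels can carry tool calls, per the review comment.
TOOL_CALL_CHANNELS = ("commentary", "analysis")

def retire_stale_tool_calls(prev_channel, tool_call_states):
    """Deactivate tool calls opened on any previous tool-call channel.

    Hypothetical sketch: retiring prior calls when prev_channel is either
    "commentary" or "analysis" ensures a new invocation never reuses a
    tool_id from the other channel.
    """
    if prev_channel in TOOL_CALL_CHANNELS:
        for state in tool_call_states:
            state["active"] = False
```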


📥 Commits

Reviewing files that changed from the base of the PR and between 7a68c42 and efffc01.

📒 Files selected for processing (1)
  • tensorrt_llm/serve/harmony_adapter.py

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
@dongfengy dongfengy force-pushed the user/dongfengy/aa_fix_harmony_main branch from efffc01 to 81f3318 on March 9, 2026 17:22
@dongfengy

/bot run

@tensorrt-cicd

PR_Github #38304 [ run ] triggered by Bot. Commit: 81f3318 Link to invocation

@tensorrt-cicd

PR_Github #38304 [ run ] completed with state SUCCESS. Commit: 81f3318
/LLM/main/L0_MergeRequest_PR pipeline #29679 completed with status: 'ABORTED'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

@JunyiXu-nv JunyiXu-nv left a comment


LGTM

return {"should_stop": "Repeated message"}

# Check for tool calls first, regardless of channel.
# The model may emit tool calls on either "commentary" or "analysis" channel.

This behavior is kind of violating the harmony definition. https://github.com/openai/harmony/blob/main/docs/format.md#receiving-tool-calls

But if the model is behaving like this, we should WAR to unblock our work.

@JunyiXu-nv

/bot run --disable-fail-fast

@tensorrt-cicd

PR_Github #38380 [ run ] triggered by Bot. Commit: 246e773 Link to invocation

@tensorrt-cicd

PR_Github #38380 [ run ] completed with state SUCCESS. Commit: 246e773
/LLM/main/L0_MergeRequest_PR pipeline #29747 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation
